Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Yang, Greg

Neural Information Processing Systems

Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks. We show that this Neural Network-Gaussian Process correspondence surprisingly extends to all modern feedforward or recurrent neural networks composed of multilayer perceptrons, RNNs (e.g. LSTMs, GRUs), (nD or graph) convolution, pooling, skip connection, attention, batch normalization, and/or layer normalization. More generally, we introduce a language for expressing neural network computations, and our result encompasses all such expressible neural networks. This work serves as a tutorial on the tensor programs technique formulated in Yang (2019) and elucidates the Gaussian Process results obtained there. We provide open-source implementations of the Gaussian Process kernels of simple RNN, GRU, transformer, and batchnorm+ReLU network at github.com/thegregyang/GP4A. Please see our arXiv version for the complete and up-to-date version of this paper.
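The shallow case Neal (1995) observed is easy to check numerically. The sketch below is not taken from the paper or its GP4A repository; the helper names and the use of the ReLU arc-cosine kernel (Cho & Saul, 2009) are choices made here for illustration. It samples many random one-hidden-layer ReLU networks and compares the empirical covariance of their outputs at two fixed inputs against the analytic infinite-width kernel.

```python
# Minimal numerical sketch of the shallow NN-GP correspondence (Neal, 1995).
# Not from the paper's GP4A code; function names and kernel choice are illustrative.
import numpy as np

def relu_nngp_kernel(x, y, sigma_w=1.0):
    """Analytic infinite-width covariance of the outputs f(x), f(y) of a
    one-hidden-layer ReLU network (arc-cosine kernel of Cho & Saul, 2009)."""
    d = len(x)
    kxx = sigma_w**2 * (x @ x) / d          # pre-activation variance at x
    kyy = sigma_w**2 * (y @ y) / d          # pre-activation variance at y
    kxy = sigma_w**2 * (x @ y) / d          # pre-activation covariance
    theta = np.arccos(np.clip(kxy / np.sqrt(kxx * kyy), -1.0, 1.0))
    # E[ReLU(u) ReLU(v)] for a centered Gaussian pair with the covariance above,
    # scaled by the output-layer weight variance sigma_w**2.
    return sigma_w**2 * np.sqrt(kxx * kyy) / (2 * np.pi) * (
        np.sin(theta) + (np.pi - theta) * np.cos(theta))

def sample_network_outputs(x, y, width, n_samples, sigma_w=1.0, seed=0):
    """Draw many independent random ReLU networks and record (f(x), f(y)) for each."""
    rng = np.random.default_rng(seed)
    d = len(x)
    X = np.stack([x, y], axis=1)            # shape (d, 2): both inputs side by side
    outs = np.empty((n_samples, 2))
    for s in range(n_samples):
        W = rng.standard_normal((width, d)) * sigma_w / np.sqrt(d)   # hidden weights
        v = rng.standard_normal(width) * sigma_w / np.sqrt(width)    # output weights
        outs[s] = v @ np.maximum(W @ X, 0.0)                         # f(x), f(y)
    return outs

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x, y = rng.standard_normal(16), rng.standard_normal(16)
    samples = sample_network_outputs(x, y, width=2048, n_samples=10000)
    print("empirical cov of (f(x), f(y)):\n", np.cov(samples, rowvar=False))
    print("analytic K(x, y):", relu_nngp_kernel(x, y))
    print("analytic K(x, x):", relu_nngp_kernel(x, x))
```

For large widths the empirical 2x2 covariance should approach the analytic kernel values; the paper's contribution is to extend this kind of statement well beyond the shallow fully-connected case.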


Reviews: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Neural Information Processing Systems

This review has two parts. The first part is my review of the paper as a standalone paper. The second part is a meta-commentary unifying my reviews for both this paper and "Neural Tangent Kernel for Any Architecture". Part 1: This paper demonstrates that infinitely wide architectures made from a range of building blocks are Gaussian processes. Fundamentally, the paper seems to have two core contributions. This paper is a clean, elegant, and logical next step in an important research direction.


Reviews: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

Neural Information Processing Systems

The paper presents a method for collapsing a wide range of operations (convolution, pooling, batchnorm, attention, gating, as well as the inner products for the actual GP kernel computation) into the matrix multiplication / nonlinearity / linear combination framework, and also a mean-field theory of tied weights, which allows a rigorous extension to RNNs as well as a rigorous integration of the forward and backward passes. The results are novel and interesting. This paper had strong overlap with another paper (that was clearly identified by the authors in both submissions), and so the discussion of the two papers took place together.
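As a reader's aid, here is a toy sketch, in notation of my own rather than the paper's formal tensor-program syntax, of what the review's "matrix multiplication / nonlinearity / linear combination" framework looks like: each vector in a computation is produced from earlier ones by one of three primitives, and a weight-tied step such as a vanilla RNN update fits the same mold.

```python
# Toy illustration of the three-primitive view of a network computation.
# This is an informal sketch, not the paper's formal tensor-program language.
import numpy as np

def matmul(W, v):
    """MatMul: a new vector obtained by multiplying a weight matrix into an existing vector."""
    return W @ v

def lincomb(coeffs, vectors):
    """LinComb: a fixed linear combination of existing vectors."""
    return sum(c * v for c, v in zip(coeffs, vectors))

def nonlin(phi, v):
    """Nonlin: a coordinatewise nonlinearity applied to an existing vector."""
    return phi(v)

# One step of a vanilla RNN, h' = tanh(W h + U x), expressed with only these primitives.
# The recurrent matrix W is reused (tied) across time steps, which is the situation the
# review says requires the mean-field treatment of tied weights.
n, d = 512, 16
rng = np.random.default_rng(0)
W = rng.standard_normal((n, n)) / np.sqrt(n)   # recurrent weights (tied across steps)
U = rng.standard_normal((n, d)) / np.sqrt(d)   # input weights
h = rng.standard_normal(n)                     # previous hidden state
x = rng.standard_normal(d)                     # current input

pre = lincomb([1.0, 1.0], [matmul(W, h), matmul(U, x)])  # LinComb of two MatMul outputs
h_next = nonlin(np.tanh, pre)                             # Nonlin
print(h_next.shape)
```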

